Chain rule

In calculus, the chain rule is a formula for the derivative of the composition of two functions.

In intuitive terms, if a variable y depends on a second variable u, which in turn depends on a third variable x, that is, y = y(u(x)), then the rate of change of y with respect to x can be computed as the rate of change of y with respect to u multiplied by the rate of change of u with respect to x. Schematically,

\frac {\mathrm dy}{\mathrm dx} = \frac {\mathrm dy} {\mathrm du} \cdot\frac {\mathrm du}{\mathrm dx}.\,\!

Informal discussion

For an explanation of notation used in this section, see Function composition.

The chain rule states that, under appropriate conditions,

 (f \circ g)'(x) = f'(g(x))g'(x),\,\!

which in short form is written as

 (f \circ g)' = (f'\circ g) \cdot g'\,\! .

Alternatively, in the Leibniz notation, the chain rule is

\frac {\mathrm dy}{\mathrm dx} = \frac {\mathrm dy} {\mathrm du} \cdot\frac {\mathrm du}{\mathrm dx}.\,\!

The chain rule can be applied to as many composed functions as needed:

 (f_1 \circ f_2 \circ \cdots \circ f_n)'(x) = f'_1\big((f_2 \circ \cdots \circ f_n)(x)\big) \cdot f'_2\big((f_3 \circ \cdots \circ f_n)(x)\big) \cdot \,\cdots\, \cdot f'_{n-1}\big(f_n(x)\big) \cdot f'_n(x).
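As a rough numerical illustration of the n-fold formula, the following Python sketch multiplies the derivative of each function, evaluated at the output of everything to its right, and compares the product with a central finite difference. The helper names (chain_derivative, numeric_derivative) and the particular functions are assumptions chosen for this example only.

```python
import math

def chain_derivative(funcs, derivs, x):
    """Derivative of f_1 ∘ f_2 ∘ ... ∘ f_n at x via the chain rule.

    funcs = [f_1, ..., f_n] and derivs = [f_1', ..., f_n'], in the same order.
    """
    # Walk from the innermost function f_n outward, tracking the current
    # inner value and multiplying in each derivative evaluated there.
    value = x
    product = 1.0
    for f, df in zip(reversed(funcs), reversed(derivs)):
        product *= df(value)
        value = f(value)
    return product

def numeric_derivative(h, x, eps=1e-6):
    """Central finite-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

# Illustrative choice: f_1 = sin, f_2 = exp, f_3(x) = x^2.
funcs = [math.sin, math.exp, lambda t: t * t]
derivs = [math.cos, math.exp, lambda t: 2 * t]

def composite(x):
    return math.sin(math.exp(x * x))

x0 = 0.7
by_chain_rule = chain_derivative(funcs, derivs, x0)
by_difference = numeric_derivative(composite, x0)
print(by_chain_rule, by_difference)
```

The two printed numbers should agree to several decimal places; the finite difference is only an approximation, so exact equality is not expected.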

In integration, the counterpart to the chain rule is the substitution rule.

Theorem

The chain rule in one variable may be stated more completely as follows.[1] Let g be a real-valued function on [a,b] which is differentiable at c ∈ [a,b], and suppose that f is a real-valued function defined on an interval I containing the range of g, with g(c) an interior point of I. If f is differentiable at g(c), then the composition f ∘ g is differentiable at c, and

 (f \circ g)'(c) = f'(g(c))\,g'(c).\,\!

Examples

Example I

Suppose that a mountain climber ascends at a rate of 0.5 kilometers per hour. The temperature is lower at higher elevations; suppose the rate by which it decreases is 6 °C per kilometer. To calculate the decrease in air temperature per hour that the climber experiences, one multiplies 6 °C per kilometer by 0.5 kilometer per hour, to obtain 3 °C per hour. This calculation is a typical chain rule application.
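The arithmetic can be written out as a small Python sketch. The function names height_km and temperature_drop_c are hypothetical and exist only to model the two linear relationships described above.

```python
def height_km(t_hours):
    """Elevation gained after t hours at 0.5 km per hour."""
    return 0.5 * t_hours

def temperature_drop_c(h_km):
    """Temperature decrease after climbing h km at 6 °C per km."""
    return 6.0 * h_km

dh_dt = 0.5  # km per hour
dT_dh = 6.0  # °C per km

# Chain rule: multiply the two rates.
rate_by_chain_rule = dT_dh * dh_dt  # °C per hour

# Direct check: temperature drop accumulated over one hour of climbing.
rate_direct = temperature_drop_c(height_km(1.0)) - temperature_drop_c(height_km(0.0))
print(rate_by_chain_rule, rate_direct)  # 3.0 3.0
```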

Example II

Consider the function f(x) = (x^2 + 1)^3. It follows from the chain rule that


\begin{align}
f(x) & = (x^2+1)^3 \\
u & = x^2+1 \\
f(x) & = u^3 \\
f'(x) & = 3u^2(u)' \\
f'(x) & = 3(x^2+1)^2(x^2+1)' \\
f'(x) & = 3(x^2+1)^2(2x) \\
\end{align}

In order to differentiate the trigonometric function

f(x) = \sin(x^2),\,

one can write:


\begin{align}
f(x) & = \sin(x^2) \\
u & = x^2 \\
f(x) & = \sin(u) \\
f'(x) & = \cos(u)(u)' \\
f'(x) & = \cos(x^2)(x^2)' \\
f'(x) & = \cos(x^2)(2x) \\
\end{align}
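Both results can be sanity-checked numerically. The sketch below, with helper and function names that are assumptions made for illustration, compares the derivatives obtained from the chain rule with central finite differences at a few sample points.

```python
import math

def numeric_derivative(h, x, eps=1e-6):
    """Central finite-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

def poly(x):
    return (x**2 + 1) ** 3

def poly_prime(x):
    # Chain rule with u = x^2 + 1: 3u^2 * u' = 3(x^2 + 1)^2 * 2x
    return 3 * (x**2 + 1) ** 2 * (2 * x)

def trig(x):
    return math.sin(x**2)

def trig_prime(x):
    # Chain rule with u = x^2: cos(u) * u' = cos(x^2) * 2x
    return math.cos(x**2) * (2 * x)

sample_points = (-1.5, 0.0, 0.5, 2.0)
for x in sample_points:
    # Each printed difference should be close to zero.
    print(x,
          poly_prime(x) - numeric_derivative(poly, x),
          trig_prime(x) - numeric_derivative(trig, x))
```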

Example III

Differentiate arctan(sin x).

\frac{\mathrm d}{\mathrm dx}\arctan x = \frac{1}{1+x^2} \,\!

Thus, by the chain rule,

\frac{\mathrm d}{\mathrm dx}\arctan f(x) = \frac{f'(x)}{1+f^2(x)}\,\! ,

and in particular,

\frac{\mathrm d}{\mathrm dx}\arctan(\sin x) = \frac{\cos x}{1+\sin^2 x}\,\! .
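The general pattern f'(x) / (1 + f(x)^2) lends itself to a small helper. The following is a minimal sketch; arctan_of is a hypothetical name, and the finite-difference comparison is only an approximate check.

```python
import math

def arctan_of(f, f_prime):
    """Return the derivative of x -> arctan(f(x)), given f and f'."""
    def derivative(x):
        return f_prime(x) / (1.0 + f(x) ** 2)
    return derivative

def numeric_derivative(h, x, eps=1e-6):
    """Central finite-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

# Specialise to f = sin, f' = cos.
d_arctan_sin = arctan_of(math.sin, math.cos)

x0 = 1.1
closed_form = math.cos(x0) / (1.0 + math.sin(x0) ** 2)
by_helper = d_arctan_sin(x0)
by_difference = numeric_derivative(lambda x: math.atan(math.sin(x)), x0)
print(closed_form, by_helper, by_difference)
```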

Example IV

An illuminating exercise is to compute the derivatives of functions that one already knows, but use the chain rule instead.

Example 4.1

\,\frac{\mathrm d}{\mathrm dx} x = 1

Rewriting x as \,\mathrm e^{\ln(x)}, we have

\,\frac{\mathrm d}{\mathrm dx} \mathrm e^{\ln(x)} = \mathrm e^{\ln(x)}\frac{\mathrm d}{\mathrm dx}{\ln(x)} = \mathrm e^{\ln(x)}\frac{1}{x} = x\cdot \frac{1}{x} = 1

Example 4.2

\,\frac{\mathrm d}{\mathrm dx} x = 1

Rewriting x as \,\ln(\mathrm e^x), we have

\,\frac{\mathrm d}{\mathrm dx} \ln(\mathrm e^x) = \frac{1}{\mathrm e^x}\frac{\mathrm d}{\mathrm dx}\mathrm e^x = \frac{1}{\mathrm e^x}\mathrm e^x = 1

Example 4.3

\,\frac{\mathrm d}{\mathrm dx} x^6 = 6x^5

Rewriting \, x^6 as \, ({x^2})^{3}, we have

\,\frac{\mathrm d}{\mathrm dx} {(x^2)}^{3} = 3(x^2)^2 \cdot \frac{\mathrm d}{\mathrm dx}x^2 = 3(x^2)^2 \cdot 2x = 6x^4 \cdot x = 6x^5

Example 4.4

\,\frac{\mathrm d}{\mathrm dx} x = 1

Rewriting x as \,\arccos(\cos(x)), we have

\begin{align}
\frac{\mathrm d}{\mathrm dx} \arccos(\cos(x)) & = \frac{-1}{\sqrt{1 - \cos^2(x)}} \frac{\mathrm d}{\mathrm dx} \cos(x) \\
& = \frac{1}{\sqrt{1 - \cos^2(x)}} \cdot \sin(x) \\
& = \frac{1}{\sqrt{\sin^2(x)}} \cdot \sin(x) \\
& = \frac{1}{\sin(x)} \cdot \sin(x) = 1
\end{align}

In this example, one has to be careful about the domain and range: the identity arccos(cos(x)) = x and the step \sqrt{\sin^2(x)} = \sin(x) both require sin(x) > 0, so the computation is valid, for instance, on the interval 0 < x < π.
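The domain caveat can be observed numerically. In the sketch below (function names are assumptions for illustration), the derivative of arccos(cos(x)) is close to 1 at a point where sin(x) > 0 and close to -1 at a point where sin(x) < 0, which matches sin(x) / |sin(x)| from the chain rule computation.

```python
import math

def h(x):
    return math.acos(math.cos(x))

def chain_rule_value(x):
    # (-1 / sqrt(1 - cos^2 x)) * (-sin x) = sin x / |sin x|
    return math.sin(x) / math.sqrt(1.0 - math.cos(x) ** 2)

def numeric_derivative(func, x, eps=1e-6):
    """Central finite-difference approximation of func'(x)."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

inside = 1.0   # 0 < x < pi, so sin(x) > 0 and arccos(cos(x)) = x
outside = 4.0  # pi < x < 2*pi, so sin(x) < 0 and arccos(cos(x)) = 2*pi - x

for x in (inside, outside):
    print(x, chain_rule_value(x), numeric_derivative(h, x))
```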

Chain rule for several variables

The chain rule works for functions of more than one variable.[2] Suppose z = f(x, y), where x = g(t) and y = h(t), that g and h are differentiable with respect to t, and that f is differentiable. Then

{\ \mathrm dz \over \mathrm dt}={\partial z \over \partial x}{\mathrm dx \over \mathrm dt}+{\partial z \over \partial y}{\mathrm dy \over \mathrm dt}\,\! .

Suppose that each argument of z = f(u, v) is a two-variable function such that u = h(x, y) and v = g(x, y), and that these functions are all differentiable. Then the chain rule takes the form:

{\partial z \over \partial x}={\partial z \over \partial u}{\partial u \over \partial x}+{\partial z \over \partial v}{\partial v \over \partial x}\,\! ,
{\partial z \over \partial y}={\partial z \over \partial u}{\partial u \over \partial y}+{\partial z \over \partial v}{\partial v \over \partial y}\,\! .

If we consider

\vec r = (u,v)

above as a Cartesian vector function, we can use vector notation to write the above equivalently as the dot product of the gradient of f and a derivative of \vec r:

\frac{\partial z}{\partial x}=\vec \nabla f(u,v) \cdot \frac{\partial \vec r}{\partial x}.

More generally, for functions of vectors to vectors, the chain rule says that the Jacobian matrix of a composite function is the product of the Jacobian matrices of the two functions:

\frac{\partial(z_1,\ldots,z_m)}{\partial(x_1,\ldots,x_p)} = \frac{\partial(z_1,\ldots,z_m)}{\partial(y_1,\ldots,y_n)} \frac{\partial(y_1,\ldots,y_n)}{\partial(x_1,\ldots,x_p)}\,\! .

Example

Given \,u = x^2 + 2y where \,x = r\sin(t) and \,y = \sin^2(t), determine the value of {\partial u \over \partial r} and {\partial u \over \partial t} using the chain rule.

{\partial u \over \partial r}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial r}+\frac{\partial u}{\partial y}\frac{\partial y}{\partial r} = \left(2x\right)\left(\sin(t)\right)+\left(2\right)\left(0\right)=2r\sin^2(t)

and

{\partial u \over \partial t}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial u}{\partial y}\frac{\partial y}{\partial t} = \left(2x\right)\left(r\cos(t)\right)+\left(2\right)\left(2\sin(t)\cos(t)\right)=
 2\left(r\sin(t)\right)r\cos(t)+4\sin(t)\cos(t) = 2\left(r^2+2\right)\sin(t)\cos(t).
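The two partial derivatives in this example can be checked against finite differences. The following Python sketch assumes an arbitrary test point (r, t) = (1.3, 0.8); the helper partial is a hypothetical name introduced here.

```python
import math

def u_of(r, t):
    """u = x^2 + 2y with x = r sin(t) and y = sin^2(t)."""
    x = r * math.sin(t)
    y = math.sin(t) ** 2
    return x**2 + 2 * y

def du_dr(r, t):
    # Result of the chain rule computation: 2 r sin^2(t)
    return 2 * r * math.sin(t) ** 2

def du_dt(r, t):
    # Result of the chain rule computation: 2 (r^2 + 2) sin(t) cos(t)
    return 2 * (r**2 + 2) * math.sin(t) * math.cos(t)

def partial(func, args, index, eps=1e-6):
    """Central finite-difference partial derivative in argument `index`."""
    plus = list(args)
    minus = list(args)
    plus[index] += eps
    minus[index] -= eps
    return (func(*plus) - func(*minus)) / (2 * eps)

r0, t0 = 1.3, 0.8
print(du_dr(r0, t0), partial(u_of, (r0, t0), 0))
print(du_dt(r0, t0), partial(u_of, (r0, t0), 1))
```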

Proof of the chain rule

Let f and g be functions and let x be a number such that f is differentiable at g(x) and g is differentiable at x. Then by the definition of differentiability,

 g(x+\delta)-g(x)= \delta g'(x) + \epsilon(\delta)\delta \,\!

where ε(δ) → 0 as δ → 0. Similarly,

 f(g(x)+\alpha) - f(g(x)) = \alpha f'(g(x)) + \eta(\alpha)\alpha \,\!

where η(α) → 0 as α → 0. Define also[3] that

 \eta(0) = 0 \,\!

Now

 f(g(x+\delta))-f(g(x))\,\!  = f(g(x) + \delta g'(x)+\epsilon(\delta)\delta) - f(g(x)) \,\!
 = \alpha_\delta f'(g(x)) + \eta(\alpha_\delta)\alpha_\delta \,\!

where

\alpha_\delta = \delta g'(x) + \epsilon(\delta)\delta. \,\!

Observe that as δ → 0, α_δ / δ → g′(x) and α_δ → 0, and thus η(α_δ) → 0. It follows that

 \frac{f(g(x+\delta))-f(g(x))}{\delta} \to g'(x)f'(g(x))\mbox{ as } \delta \to 0\,\! .
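The quantities in this argument can be tabulated for a concrete pair of functions. The sketch below assumes f = exp and g = sin at x = 0.4, chosen only for illustration; it prints ε(δ), η(α_δ) and the gap between the difference quotient and g′(x)f′(g(x)), all of which should shrink as δ decreases.

```python
import math

f, f_prime = math.exp, math.exp
g, g_prime = math.sin, math.cos

x = 0.4
target = g_prime(x) * f_prime(g(x))  # g'(x) f'(g(x))

errors = []
for k in range(1, 6):
    delta = 10.0 ** (-k)
    alpha = g(x + delta) - g(x)           # alpha_delta in the proof
    epsilon = alpha / delta - g_prime(x)  # epsilon(delta)
    # eta(alpha), with eta(0) = 0 by definition
    if alpha != 0.0:
        eta = (f(g(x) + alpha) - f(g(x))) / alpha - f_prime(g(x))
    else:
        eta = 0.0
    quotient = (f(g(x + delta)) - f(g(x))) / delta
    errors.append(quotient - target)
    print(delta, epsilon, eta, quotient - target)
```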

To prove the multivariate chain rule, we will deal with the case of functions of two variables; a similar proof can be constructed for functions of three or more variables. Let x(t), y(t) be differentiable functions of t and assume f(x, y) has continuous first partial derivatives. If we set \,\Delta x = x(t + h) - x(t) and \,\Delta y = y(t + h) - y(t), then we have:

f'(x(t), y(t)) = \lim_{h\rightarrow 0} \frac{f(x(t + h), y(t + h)) - f(x(t), y(t))}{h}\,\!
= \lim_{h\rightarrow 0} \frac{f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y) + f(x, y + \Delta y) - f(x, y)}{h}\,\!
= \lim_{h\rightarrow 0} \frac{f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y)}{h} + \lim_{h\rightarrow 0} \frac{f(x, y + \Delta y) - f(x, y)}{h}\,\! .

When x is constant, we can regard \,f(x, y) as a function \,f_{x}(y) of \,y. Thus the limit on the right is equal to the derivative of \,f_{x}(y(t)), which by the single variable chain rule is \,\frac{\partial f}{\partial y} \frac{\mathrm dy}{\mathrm dt}\,\! .

To calculate the limit on the left, regard \,f(x, y + \Delta y) as a function \,f_{y + \Delta y}(x) of \,x. By the mean value theorem, we can select a real number \,c between \,x and \,x + \Delta x such that the numerator in the left limit is equal to \,\Delta x  \frac{\mathrm df_{y + \Delta y}}{\mathrm dx}(c)\,\! . So the left limit is equal to \lim_{h\rightarrow 0}\frac{\Delta x}{h}\frac{\mathrm df_{y + \Delta y}}{\mathrm dx}(c). As h → 0, both Δx and Δy tend to 0, so c → x; provided \frac{\partial f}{\partial x} is continuous at (x, y), the limit equals \frac{\partial f}{\partial x} \frac{\mathrm dx}{\mathrm dt}.\,\!

Thus, it follows that

f'(x(t), y(t)) = \frac{\partial f}{\partial x}\frac{\mathrm dx}{\mathrm dt} + \frac{\partial f}{\partial y}\frac{\mathrm dy}{\mathrm dt} = \nabla f(x, y) \cdot (x', y')\,\! .

The fundamental chain rule

The chain rule is a fundamental property of all definitions of derivatives and is therefore valid in much more general contexts. For instance, if E, F and G are Banach spaces (which include Euclidean spaces) and f : E → F and g : F → G are functions, and if x is an element of E such that f is differentiable at x and g is differentiable at f(x), then the derivative (the Fréchet derivative) of the composition g ∘ f at the point x is given by

\mbox{D}_x\left(g \circ f\right) = \mbox{D}_{f\left(x\right)}\left(g\right) \circ \mbox{D}_x\left(f\right).

Note that the derivatives here are linear maps and not numbers. If the linear maps are represented as matrices (namely Jacobians), the composition on the right hand side turns into a matrix multiplication.
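In finite dimensions this statement can be checked numerically. The following is a minimal pure-Python sketch that approximates Jacobians by central differences and compares the Jacobian of g ∘ f with the matrix product of the individual Jacobians. The maps f_inner : R^2 → R^3 and g_outer : R^3 → R^2, the test point, and the helper names are all assumptions made for this example.

```python
import math

def jacobian(func, point, eps=1e-6):
    """Finite-difference Jacobian of func: R^n -> R^m at `point` (list of floats)."""
    n = len(point)
    m = len(func(point))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def f_inner(x):  # f : R^2 -> R^3
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g_outer(y):  # g : R^3 -> R^2
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def composite(x):  # g ∘ f : R^2 -> R^2
    return g_outer(f_inner(x))

x0 = [0.3, -1.2]
lhs = jacobian(composite, x0)                                      # D_x(g ∘ f)
rhs = matmul(jacobian(g_outer, f_inner(x0)), jacobian(f_inner, x0))  # D_{f(x)}(g) D_x(f)
print(lhs)
print(rhs)
```

The two printed 2 × 2 matrices should agree up to finite-difference error.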

A particularly clear formulation of the chain rule can be achieved in the most general setting: let M, N and P be C^k manifolds (or even Banach manifolds) and let

f : M → N and g : N → P

be differentiable maps. The derivative of f, denoted by df, is then a map from the tangent bundle of M to the tangent bundle of N, and we may write

\mbox{d}\left(g \circ f\right) = \mbox{d}g \circ \mbox{d}f.

In this way, the formation of derivatives and tangent bundles is seen as a functor on the category of C^∞ manifolds with C^∞ maps as morphisms.

Tensors and the chain rule

See tensor field for an advanced explanation of the fundamental role the chain rule plays in the geometric nature of tensors.

Higher derivatives

Faà di Bruno's formula generalizes the chain rule to higher derivatives. The first few derivatives are

\frac{\mathrm d (f \circ g) }{\mathrm dx} = \frac{\mathrm df}{\mathrm dg}\frac{\mathrm dg}{\mathrm dx}

  \frac{\mathrm d^2 (f \circ g) }{\mathrm d x^2}
  = \frac{\mathrm d^2 f}{\mathrm d g^2}\left(\frac{\mathrm dg}{\mathrm dx}\right)^2
    + \frac{\mathrm df}{\mathrm dg}\frac{\mathrm d^2 g}{\mathrm dx^2}

  \frac{\mathrm d^3 (f \circ g) }{\mathrm d x^3}
  = \frac{\mathrm d^3 f}{\mathrm d g^3} \left(\frac{\mathrm dg}{\mathrm dx}\right)^3 
    + 3 \frac{\mathrm d^2 f}{\mathrm d g^2} \frac{\mathrm dg}{\mathrm dx} \frac{\mathrm d^2 g}{\mathrm d x^2}
    + \frac{\mathrm df}{\mathrm dg} \frac{\mathrm d^3 g}{\mathrm d x^3}

  \frac{\mathrm d^4 (f \circ g) }{\mathrm d x^4}
  =\frac{\mathrm d^4 f}{\mathrm dg^4} \left(\frac{\mathrm dg}{\mathrm dx}\right)^4 
    + 6 \frac{\mathrm d^3 f}{\mathrm d g^3} \left(\frac{\mathrm dg}{\mathrm dx}\right)^2 \frac{\mathrm d^2 g}{\mathrm d x^2} 
    + \frac{\mathrm d^2 f}{\mathrm d g^2} \left\{ 4 \frac{\mathrm dg}{\mathrm dx} \frac{\mathrm d^3 g}{\mathrm dx^3} + 3\left(\frac{\mathrm d^2 g}{\mathrm dx^2}\right)^2\right\}
    + \frac{\mathrm df}{\mathrm dg}\frac{\mathrm d^4 g}{\mathrm dx^4}.
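These four formulas can be checked on a case where the answer is known. The sketch below assumes f(u) = u^3 and g(x) = x^2, so that f ∘ g is x^6 with derivatives 6x^5, 30x^4, 120x^3 and 360x^2; the function name faa_di_bruno_up_to_4 is hypothetical.

```python
def faa_di_bruno_up_to_4(f_derivs, g_derivs):
    """First four derivatives of f ∘ g at a point.

    f_derivs = (f', f'', f''', f'''') evaluated at g(x);
    g_derivs = (g', g'', g''', g'''') evaluated at x.
    """
    f1, f2, f3, f4 = f_derivs
    g1, g2, g3, g4 = g_derivs
    d1 = f1 * g1
    d2 = f2 * g1**2 + f1 * g2
    d3 = f3 * g1**3 + 3 * f2 * g1 * g2 + f1 * g3
    d4 = (f4 * g1**4 + 6 * f3 * g1**2 * g2
          + f2 * (4 * g1 * g3 + 3 * g2**2) + f1 * g4)
    return d1, d2, d3, d4

x = 2
u = x**2                             # g(x)
f_derivs = (3 * u**2, 6 * u, 6, 0)   # derivatives of u^3 at u = g(x)
g_derivs = (2 * x, 2, 0, 0)          # derivatives of x^2 at x
computed = faa_di_bruno_up_to_4(f_derivs, g_derivs)
expected = (6 * x**5, 30 * x**4, 120 * x**3, 360 * x**2)
print(computed)  # (192, 480, 960, 1440)
print(expected)  # (192, 480, 960, 1440)
```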

References

  1. Apostol, Tom (1974). Mathematical analysis (2nd ed.). Addison Wesley. Theorem 5.5.
  2. "The Multivariable Chain Rule". http://www.math.hmc.edu/calculus/tutorials/multichainrule/. Retrieved 2009-11-06. 
  3. To see that this is needed, suppose for example that g is a constant function.
